Clinical prediction in defined populations: a simulation study investigating when and how to aggregate existing models

Authors

  • Glen P. Martin
  • Mamas A. Mamas
  • Niels Peek
  • Iain Buchan
  • Matthew Sperrin
Abstract

BACKGROUND: Clinical prediction models (CPMs) are increasingly deployed to support healthcare decisions, but they are derived inconsistently, in part due to limited data. An emerging alternative is to aggregate existing CPMs developed for similar settings and outcomes. This simulation study aimed to investigate the impact of between-population heterogeneity and sample size on aggregating existing CPMs in a defined population, compared with developing a model de novo.

METHODS: Simulations were designed to mimic a scenario in which multiple CPMs for a binary outcome had been derived in distinct, heterogeneous populations, with potentially different predictors available in each. We then generated a new 'local' population and compared the performance of CPMs developed for this population by aggregation (using stacked regression, principal component analysis or partial least squares) with redevelopment from scratch using backwards selection and penalised regression.

RESULTS: While redevelopment approaches resulted in models that were miscalibrated for local datasets of fewer than 500 observations, model aggregation methods were well calibrated across all simulation scenarios. When the local data comprised fewer than 1000 observations and between-population heterogeneity was small, aggregating existing CPMs gave better discrimination and the lowest mean square error in the predicted risks compared with deriving a new model. Conversely, with more than 1000 observations and significant between-population heterogeneity, redevelopment outperformed the aggregation approaches. In all other scenarios, aggregation and de novo derivation resulted in similar predictive performance.

CONCLUSION: This study demonstrates a pragmatic approach to contextualising CPMs to defined populations. When aiming to develop models in defined populations, modellers should consider existing CPMs, with aggregation approaches being a suitable modelling strategy, particularly with sparse data on the local population.
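As a rough illustration of the stacked-regression aggregation named in the methods, the sketch below combines the linear predictors of two existing CPMs into a single local model. It is not the authors' simulation code: the coefficients, the local sample size, the learning rate, and the choice to keep the stacking weights non-negative are all assumptions made for this example, and every function name is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(coefs, x):
    """coefs = [intercept, b1, b2, ...]; x = list of predictor values."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


def fit_stacked_regression(lps, y, n_iter=1000, lr=0.1):
    """Fit logit P(y=1) = a + sum_j w_j * LP_j by projected gradient ascent.

    lps: one row per patient, holding each existing model's linear predictor.
    Weights are clipped at zero after every step (an assumption of this sketch).
    """
    n, m = len(lps), len(lps[0])
    a, w = 0.0, [0.0] * m
    for _ in range(n_iter):
        grad_a, grad_w = 0.0, [0.0] * m
        for row, yi in zip(lps, y):
            resid = yi - sigmoid(a + sum(wj * lj for wj, lj in zip(w, row)))
            grad_a += resid
            for j in range(m):
                grad_w[j] += resid * row[j]
        a += lr * grad_a / n
        w = [max(0.0, wj + lr * gj / n) for wj, gj in zip(w, grad_w)]
    return a, w


random.seed(1)

# Hypothetical existing CPMs derived elsewhere: [intercept, beta_x1, beta_x2].
existing_cpms = [[-1.0, 0.8, 0.5], [-0.5, 0.6, 0.9]]
# Hypothetical data-generating model for the small local population.
true_local = [-0.8, 0.7, 0.7]

X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
y = [1 if random.random() < sigmoid(linear_predictor(true_local, x)) else 0
     for x in X]

# Each existing model contributes one column: its linear predictor.
lps = [[linear_predictor(c, x) for c in existing_cpms] for x in X]
intercept, weights = fit_stacked_regression(lps, y)


def predict_risk(x):
    """Aggregated local risk for a new patient with predictors x."""
    row = [linear_predictor(c, x) for c in existing_cpms]
    return sigmoid(intercept + sum(wj * lj for wj, lj in zip(weights, row)))


print("intercept:", intercept, "weights:", weights)
print("risk at x = [0, 0]:", predict_risk([0.0, 0.0]))
```

The local data are used only to estimate one intercept and one weight per existing model, rather than one coefficient per predictor. That smaller number of parameters is the reason aggregation is plausible when local data are sparse.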

Similar resources

Predictive Ability of Statistical Genomic Prediction Methods When Underlying Genetic Architecture of Trait Is Purely Additive

A simulation study was conducted to address how a purely additive (simple) genetic architecture might affect the efficacy of parametric and non-parametric genomic prediction methods. For this purpose, we simulated a trait with narrow-sense heritability h² = 0.3, with only additive genetic effects for 300 loci, in order to compare the predictive ability of 14 more practically used ge...

A MULTI-OBJECTIVE OPTIMIZATION MODEL FOR PROJECT PORTFOLIO SELECTION CONSIDERING AGGREGATE COMPLEXITY: A CASE STUDY

Existing project selection models do not consider project complexity as a selection criterion, even though complexity may prolong a project's duration and even result in its failure. In addition, existing models cannot formulate the aggregate complexity of the selected projects. The aggregate complexity is not always equal to the sum of the individual project complexities because of possible syne...

Total and Partial efficiency indexes in data envelopment analysis

Introduction: Data envelopment analysis (DEA) is a data-oriented method for measuring and benchmarking the relative efficiency of peer decision making units (DMUs) with multiple inputs and multiple outputs. DEA was initiated in 1978 when Charnes, Cooper and Rhodes (CCR) demonstrated how to change a fractional linear measure of efficiency into a linear programming format. This non-parametric app...

Comparison of different statistical methods in genomic selection of Holstein cattle

Genomic selection combines statistical methods with genomic data to predict genetic values for complex traits. The accuracy of prediction of genetic values in the selected population has a great effect on the success of this selection method. The accuracy of genomic prediction is highly dependent on the statistical model used to estimate marker effects in the reference population. Various factors such a...

Designing and evaluation of a decision support system for prediction of coronary artery disease

Introduction: Since human health is the subject of medical research, correct prediction of results is of high importance. This study applies a probabilistic neural network (PNN) to predict coronary artery disease (CAD), because the PNN is stronger than other methods. Methods: In this descriptive-analytic study, the PNN method was implemented on 150 patients admitted to the Mazandaran Heart...

Journal title:

Volume 17, Issue

Pages -

Publication date: 2017